Feature selection for high-dimensional classification using a competitive swarm optimizer

Authors

  • Shenkai Gu
  • Ran Cheng
  • Yaochu Jin
Abstract

Many machine learning problems, such as classification, involve a large number of input features. However, not all features are relevant for solving the problem, and including irrelevant features may sometimes deteriorate the learning performance. Therefore, it is essential to select the most relevant features, which is known as feature selection. Many feature selection algorithms have been developed, including evolutionary algorithms and particle swarm optimization (PSO) algorithms, to find a subset of the most important features for accomplishing a particular machine learning task. However, the traditional PSO does not perform well on large-scale optimization problems, which degrades the effectiveness of PSO for feature selection when the number of features increases dramatically. In this paper, we propose to use a very recent PSO variant, known as the competitive swarm optimizer (CSO), which is dedicated to large-scale optimization, for solving high-dimensional feature selection problems. In addition, the CSO, which was originally developed for continuous optimization, is adapted to perform feature selection, which can be considered a combinatorial optimization problem. An archive technique is also introduced to reduce computational cost. Experiments on six benchmark datasets demonstrate that compared to the canonical PSO-based method and a state-of-the-art PSO variant for feature selection, the proposed CSO-based...

Author affiliations

  • Shenkai Gu: Department of Computer Science, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom
  • Ran Cheng: Department of Computer Science, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom
  • Yaochu Jin: (1) Department of Computer Science, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom; (2) School of Management Science and Engineering, Dalian University of Technology, Dalian 116023, China
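As a rough illustration of the idea summarized in the abstract, the sketch below shows how a competitive swarm optimizer might be applied to feature selection. It is not the authors' implementation. The pairwise-competition update (the loser learns from the winner and from the swarm mean) follows the commonly published CSO rule; the 0.5 threshold that turns continuous positions into feature masks, the dictionary used as an archive of evaluated masks, the function name `cso_feature_selection`, and the toy fitness are all assumptions made for this example.

```python
import numpy as np


def cso_feature_selection(fitness, n_features, swarm_size=20, iters=50,
                          phi=0.1, threshold=0.5, seed=0):
    """Minimise fitness(mask) over boolean feature masks (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.random((swarm_size, n_features))  # continuous positions in [0, 1]
    v = np.zeros_like(x)                      # velocities
    archive = {}                              # assumed archive: mask bytes -> fitness

    def evaluate(pos):
        mask = pos > threshold                # assumed decoding: select if > threshold
        key = mask.tobytes()
        if key not in archive:                # only evaluate masks not seen before
            archive[key] = fitness(mask)
        return archive[key]

    best_mask, best_fit = None, np.inf

    def evaluate_swarm():
        nonlocal best_mask, best_fit
        fit = np.array([evaluate(p) for p in x])
        i = int(fit.argmin())
        if fit[i] < best_fit:
            best_fit, best_mask = float(fit[i]), x[i] > threshold
        return fit

    for _ in range(iters):
        fit = evaluate_swarm()
        mean = x.mean(axis=0)                 # swarm mean position
        order = rng.permutation(swarm_size)   # random pairing for competitions
        for a, b in zip(order[0::2], order[1::2]):
            w, l = (a, b) if fit[a] <= fit[b] else (b, a)
            r1, r2, r3 = rng.random((3, n_features))
            # The loser learns from the winner and the swarm mean; the winner is unchanged.
            v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + phi * r3 * (mean - x[l])
            x[l] = np.clip(x[l] + v[l], 0.0, 1.0)
    evaluate_swarm()                          # score the final positions too
    return best_mask, best_fit


# Hypothetical toy problem: only the first 5 of 30 features are relevant.
relevant = np.zeros(30, dtype=bool)
relevant[:5] = True


def toy_fitness(mask):
    """Toy error: missed relevant features plus a small cost per extra feature."""
    missed = np.sum(relevant & ~mask)
    extra = np.sum(~relevant & mask)
    return float(missed + 0.1 * extra)


if __name__ == "__main__":
    mask, err = cso_feature_selection(toy_fitness, n_features=30)
    print("selected features:", np.flatnonzero(mask), "fitness:", err)
```

In practice the toy fitness would be replaced by a classifier's error rate on the selected features, which is where caching repeated masks would save the most computation.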

Similar resources

Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection

The main purpose of feature subset selection is to remove irrelevant and redundant features from data, so that learning algorithms can be trained on a subset of relevant features. So far, many algorithms have been developed for feature subset selection, and most of these algorithms suffer from two major problems when handling high-dimensional datasets: First, some of these algorithms search in...

A Modified Grey Wolf Optimizer by Individual Best Memory and Penalty Factor for Sonar and Radar Dataset Classification

Meta-heuristic Algorithms (MA) have been widely accepted in recent decades as excellent ways to solve a variety of optimization problems. Grey Wolf Optimization (GWO) is a novel Meta-heuristic Algorithm (MA) that has generated a great deal of research interest due to its advantages such as simple implementation and powerful exploitation. This study proposes a novel GWO-based MA and two extra fea...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Determining Effective Features for Face Detection Using a Hybrid Feature Approach

Detecting faces in cluttered backgrounds and real-world scenes remains an unsolved problem. In this paper, by combining several kinds of independent features with one of the most common appearance-based approaches and multilayer perceptron (MLP) neural networks, not only have some questions been answered, but the designed system has also achieved better performance than the pre...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression gives access to a wealth of information with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce high-dimensional data with a statistical method, selecting valuable genes with high impact as biomarkers, and then classify ovarian tumors based on gene expression data of...

Journal:
  • Soft Comput.

Volume 22

Publication date: 2018